Czech Monolingual Information Retrieval Using Off-The-Shelf Components - the University of West Bohemia at CLEF 2007 Ad-Hoc track

Authors

  • Pavel Ircing
  • Ludek Müller
Abstract

The paper provides a brief description of the system assembled for the CLEF 2007 Ad-Hoc track by the University of West Bohemia. We performed only monolingual experiments (Czech documents, Czech queries) using two incarnations of the tf.idf model, one with raw term frequency and the other with the BM25 term frequency weighting, as implemented in the Lemur toolkit. The effect of blind relevance feedback was also explored. A Czech morphological analyser and tagger were used for lemmatization and stop word removal. The results achieved seem to be quite reasonable, with MAP ranging from 0.11 to 0.30.
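The difference between the two term frequency variants mentioned in the abstract can be illustrated with a minimal sketch. This is not the Lemur implementation or the authors' code: the function names, the `log(N/df)` form of idf, and the parameter values `k1=1.2` and `b=0.75` are assumptions chosen for illustration, and the toy documents stand in for lemmatized Czech text.

```python
import math
from collections import Counter


def raw_tf(tf, doc_len, avg_doc_len):
    """Raw term frequency: the count is used as-is (lengths are ignored)."""
    return float(tf)


def bm25_tf(tf, doc_len, avg_doc_len, k1=1.2, b=0.75):
    """BM25-style term frequency: saturates as tf grows and is
    normalized by document length. k1 and b are assumed defaults."""
    norm = k1 * (1.0 - b + b * doc_len / avg_doc_len)
    return k1 * tf / (tf + norm)


def score(query, docs, tf_weight):
    """Score each document as the sum over query terms of tf_weight * idf.

    `docs` is a list of token lists (assumed already lemmatized and
    stripped of stop words); idf is taken as log(N / df), an assumption.
    """
    n_docs = len(docs)
    avg_len = sum(len(d) for d in docs) / n_docs
    # Document frequency: number of documents containing each term.
    df = Counter(term for d in docs for term in set(d))
    scores = []
    for d in docs:
        counts = Counter(d)
        s = 0.0
        for term in query:
            if counts[term] == 0:
                continue  # term absent from this document
            idf = math.log(n_docs / df[term])
            s += tf_weight(counts[term], len(d), avg_len) * idf
        scores.append(s)
    return scores


# Hypothetical toy collection.
docs = [
    ["praha", "praha", "praha", "hrad"],
    ["praha", "most"],
    ["pivo", "most"],
]
raw_scores = score(["praha"], docs, raw_tf)
bm25_scores = score(["praha"], docs, bm25_tf)
```

With raw term frequency the first document scores three times as high as the second, because the term occurs three times as often. Under the BM25-style weighting the gap is much smaller, since repeated occurrences contribute less and less and the longer document is penalized for its length.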


Similar resources

Attempts to Search Czech Spontaneous Spoken Interviews - the University of West Bohemia at CLEF 2007 CL-SR track

The paper presents an overview of the system built and the experiments performed for the CLEF 2007 CL-SR track by the University of West Bohemia. We have concentrated on the monolingual experiments using the Czech collection only. The approach that was successfully employed by our team in last year's campaign (simple tf.idf model with blind relevance feedback, accompanied by solid linguistic ...


Charles University at CLEF 2007 Ad-Hoc Track

In this paper we describe retrieval experiments performed at Charles University in Prague for participation in the CLEF 2007 Ad-Hoc track. We focused on the Czech monolingual task and used the LEMUR toolkit as the retrieval system. Our results demonstrate that for Czech as a highly inflectional language, lemmatization significantly improves retrieval results and manually created queries are onl...


MIRACLE Progress in Monolingual Information Retrieval at Ad-Hoc CLEF 2007

This paper presents the MIRACLE team's 2007 approach to the Ad-Hoc Information Retrieval track. The main work carried out for this campaign centred on monolingual experiments, in the standard and in the robust tracks. The most important contributions have been the general introduction of automatic named-entity extraction and the use of Wikipedia resources. For the 2007 campaign, runs were...


Benefit of Proper Language Processing for Czech Speech Retrieval in the CL-SR Task at CLEF 2006

The paper describes the system built by the team from the University of West Bohemia for participation in the CLEF 2006 CL-SR track. We have decided to concentrate only on monolingual searching in the Czech test collection and to investigate the effect of proper language processing on retrieval performance. We have employed the Czech morphological analyser and tagger for that purpose. For...


The University of West Bohemia at CLEF 2006, the CL-SR Track

The paper describes the system built by the team from the University of West Bohemia for participation in the CLEF 2006 CL-SR track. We have decided to concentrate only on monolingual searching in the Czech test collection. We have employed the Czech morphological analyser and tagger in order to perform the necessary linguistic preprocessing (lemmatization and stop-word removal). As for the act...



Publication date: 2007